ChIPWig: a random access-enabling lossless and lossy compression method for ChIP-seq data
Abstract
Motivation: Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are inexpensive and time-efficient, and they produce massive datasets that pose significant storage and maintenance challenges. To address the resulting Big Data problems, we propose ChIPWig, a lossless and lossy compression framework designed specifically for ChIP-seq Wig data. ChIPWig enables random access and summary statistics lookups, and it is based on the asymptotic theory of optimal point density design for nonuniform quantizers.
Results: We tested the ChIPWig compressor on 10 ChIP-seq datasets generated by the ENCODE consortium. On average, lossless ChIPWig reduced file sizes to merely 6% of the original and offered a 6-fold improvement in compression rate compared to bigWig. The lossy mode further reduced file sizes 2-fold compared to the lossless mode, with little or no effect on peak calling and motif discovery using specialized NarrowPeaks methods. Compression and decompression speeds are on the order of 0.2 sec/MB on general-purpose computers.
Availability and implementation: The source code and binaries, implemented in C++, are freely available for download at https://github.com/vidarmehr/ChIPWig-v2.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
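The abstract names the asymptotic theory of optimal point density design for nonuniform quantizers but does not spell out the design. As a rough illustration only, and not the ChIPWig implementation, the sketch below assumes a mean-squared-error distortion, for which the classical high-resolution result places reconstruction points with density proportional to the cube root of the source density. The function names (`design_quantizer`, `quantize`), the histogram density estimate and the bin count are all hypothetical choices made for this example.

```python
from bisect import bisect_right


def design_quantizer(samples, levels, bins=64):
    """Place `levels` reconstruction points with point density
    proportional to (histogram density)^(1/3).

    Assumption: MSE distortion; this is not taken from ChIPWig itself.
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant input
    # Histogram estimate of the source density f(x).
    hist = [0] * bins
    for x in samples:
        hist[min(int((x - lo) / width), bins - 1)] += 1
    # Point density lambda(x) ~ f(x)^(1/3), accumulated into a CDF.
    weights = [h ** (1.0 / 3.0) for h in hist]
    total = sum(weights)
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)
    # Invert the CDF at the midpoints (k + 0.5) / levels.
    points = []
    for k in range(levels):
        target = (k + 0.5) / levels
        b = min(bisect_right(cdf, target) - 1, bins - 1)
        span = cdf[b + 1] - cdf[b]
        frac = (target - cdf[b]) / span if span > 0 else 0.5
        points.append(lo + (b + frac) * width)
    return points


def quantize(x, points):
    """Map x to the nearest reconstruction point (lossy step)."""
    return min(points, key=lambda p: abs(p - x))


# Hypothetical skewed signal: many small values, a sparse tail of large ones.
samples = [float(i % 10) for i in range(900)] + [float(v) for v in range(10, 110)]
points = design_quantizer(samples, levels=16)
```

With a skewed input like this, the reconstruction points crowd into the dense low-value range and spread out over the sparse tail, which is the qualitative behaviour a nonuniform quantizer is meant to provide.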
Similar resources
Compressing the Incompressible with ISABELA: In-situ Reduction of Spatio-temporal Data
Modern large-scale scientific simulations running on HPC systems generate data in the order of terabytes during a single run. To lessen the I/O load during a simulation run, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Yet, lossless compression techniques are hardly suitable for scientific data due to its inherently random natur...
Speech Data Compression for Embedded Systems
The main concern of this paper is speech data compression for low-cost embedded systems such as voice-related toys or devices with interactive sound-responses. We use a PC to generate and compress 8-bit-speech-data that has various features such as human speech, symphony and animal songs; the compressed data are then transferred to a masked-ROM. An Intel 8051 embedded chip is employed to expand...
Progressive Compression of Visibility Data for View-dependent Multiresolution Meshes
In this paper we present a lossless and optionally lossy compression method for precomputed visibility data for view-dependent multiresolution meshes, which supports out-of-core rendering and progressive transmission through slow connections. Our approach has the feature that visibility information can be stored directly in the nodes of the multiresolution structure and only necessary parts of...
Transformations for the compression of FASTQ quality scores of next-generation sequencing data
MOTIVATION The growth of next-generation sequencing means that more effective and efficient archiving methods are needed to store the generated data for public dissemination and in anticipation of more mature analytical methods later. This article examines methods for compressing the quality score component of the data to partly address this problem. RESULTS We compare several compression pol...
Image compression by intelligent removal and coding of image information and its reconstruction using image inpainting algorithms
Compression can be done by lossy or lossless methods. Lossy methods have been used more widely than lossless compression. Although many methods for image compression have been proposed, methods that use intelligent skipping suited to visual models have not been considered in the literature. Image inpainting refers to the application of sophisticated algorithms to replace lost o...
Journal title:
- Bioinformatics
Volume 34, Issue 6
Publication date: 2018